Abstract. We study the formalization of a collection of documents created
for a Software Engineering project from an MKM perspective. We analyze how
document and collection markup formats can cope with an open-ended,
multi-dimensional space of primary and secondary classifications and
relationships. We show that RDFa-based extensions of MKM formats, employing
flexible "metadata" relationships referencing specific vocabularies for
distinct dimensions, are well-suited to encode this space and to put it into
service. This formalized knowledge can be used for enriching interactive
document browsing, for enabling multi-dimensional metadata queries over
documents and collections, and for exporting Linked Data to the Semantic Web
and thus enabling further reuse.

1 Introduction 

The field of Mathematical Knowledge Management (MKM) tries to model math- 
ematical objects and their relationships, their creation and publication processes, 
and their management requirements. In [CF09, 237 ff.] Carette and Farmer
analyzed "six major lenses through which researchers view MKM": the document, 
library, formal, digital, interactive, and the process lens. Quite obviously, there 
is a gap between the formal aspects {"library", "formal", "digital"} - related to 
machine use of mathematical knowledge - and the informal ones {"document", 
"interactive", "process"} - related to human use. 

In the FormalSafe project [For08] at the German Research Center for
Artificial Intelligence (DFKI) Bremen a main goal is the integration of
project documents into a computer-supported software development process.
MKM techniques are used to bridge the gap between informally stated user
requirements and formal verification. One of the FormalSafe case studies is
based on the documents of the SAMS project ("Sicherungskomponente für
Autonome Mobile Systeme [Safety Component for Autonomous Mobile Systems]",
see [FHL+08]) at DFKI. The SAMS objective was to develop a safety component
for autonomous mobile service robots and to get it certified as SIL-3
standard compliant in the course of three years. On the one hand,
certification required the verification of certain safety properties in the
code documents with the proof checker Isabelle [NPW02]. On the other hand, it
required the software development process to follow the V-Model (fig. 1).
This mandates e.g. that relevant document fragments get justified and linked
to corresponding fragments in a successive document refinement process (the
arms of the 'V' from the upper left over the bottom to the upper right and
between arms in fig. 1).

The collection of SAMS documents (we call it "SAMSDocs" [SAM09]) promised an
interesting case study for FormalSafe, as system development with respect to
the V-Model regime resulted in a highly interconnected collection of design
documents, certification documents, code, formal specifications, and formal
proofs. Furthermore, it was supposed that adding semantics to SAMSDocs would
be comparatively easy as it was developed under a strong formalization
pressure.

Fig. 1. Documents in the V-Model

In this paper we report on, and draw conclusions from, the SAMSDocs
formalization, particularly the formalization of its LaTeX documents. In
section 2 we document the process and detect inherent, distinct formality
levels and the multi-dimensionality of the formalized structures. Real
information needs (drawn from three use cases in the SAMS context) turn out
in section 3 to be multi-dimensional. This motivates our exploration of
multi-dimensional markup in section 4. Section 5 showcases the feasibility of
multi-dimensional services with MKM technology enabled by multi-dimensional
structured representations, and section 6 concludes the paper.

2 Dimensions of Formality in SAMSDocs



In this paper, we are especially interested in the question "What should we 
sensibly formalize in a document collection and can MKM methods 
cope?". Note that we understand "to formalize" as "making implicit knowledge 
explicit" and not as "to make s.th. fully formal". 

The SAMS project was organized as a typical Software Engineering project; its
collection of documents SAMSDocs therefore has a prototypical composition of
distinct document types like contract, code, or manual. Thus, SAMSDocs
presents a good base for a case study with respect to our question. In fig. 2
we can see the concrete distribution over used document formats in SAMSDocs.
Requirements analysis, system and module specifications, reviews, and the
final manual were mainly written in LaTeX, only roughly a sixth in MS Word.
The implementation in Misra-C contains Isabelle theorem prover calls.

The first stinging, but unsurprising observation was that the level of
formality of the documents in SAMSDocs varies considerably, because distinct
purposes create distinct formality requirements. For instance, the contract
document serves as communication medium between the customer and the
contractor. Here, underspecification is an important tool, whereas it is
regarded harmful in the fine-granular module specifications and a fatal flaw
in input logic for a theorem prover. Since this issue was already present in
the set of LaTeX documents, we focused on just these.

For the formalization of this subset in SAMSDocs we used the sTeX system
[Koh08], a semantic extension of LaTeX. It allows documents to be published
both as high-quality human-readable PDF and as formal machine-processable
OMDoc [Koh06] via LaTeXML [SKG+10]. Our formalization process revealed early
on that previous sTeX applications (based on OMDoc 1.2) were too rigid for a
stepwise semantic markup. But fortunately, sTeX also allows for the OMDoc 1.3
scheme of metadata via RDFa [ABMP08] annotations (see [Koh10]). In
particular, we could 'invent' our own vocabulary for markup on demand without
extending OMDoc. This new vocabulary consists of SAMSDocs-specific metadata
properties and relationship types. We call the process of adding this
pre-formal markup to SAMSDocs (semantic) preloading. Concretely, we extended
sTeX to sTeX-SD (sTeX for SAMSDocs) by adding LaTeXML bindings for all
SAMS-specific TeX macros and environments used in SAMSDocs, thus enabling the
conservation of the original PDF document layouts at the same time as the
generation of meaningful OMDoc.

Fig. 3. The Formalization Workflow with sTeX-SD [translated by the authors]

Let us look at an example for such an sTeX extension within our formalization
workflow (see fig. 3). We started out with a TeX document (upper left), which
compiled to the PDF seen on the upper right. Here, we have a simple,
two-dimensional table, which is realized with the LaTeX environment tabular.
Semantically, this table contains a list of symbols for document states with
their definitions, e. g. "i. B." for "in Bearbeitung [in progress]". As such
definition tables were used throughout the project, we developed the
environment SDTab-def and the macro SDdef as sTeX extensions. We determined
the OMDoc output for these to be a symbol together with its definition
element (for each use of SDdef in place of the resp. table row) and moreover,
to group all of them into a theory (via using SDTab-def). Preloading the TeX
table by employing SDTab-def and SDdef turned it into an sTeX document
(middle of fig. 3) while keeping the original PDF table structure. Using
LaTeXML on this sTeX document produces the OMDoc output shown in the lower
area of fig. 3.

Mathematical, structural relationships have a privileged state in sTeX: their
command sequence / environment syntax is analogous to the native XML element
and attribute names in OMDoc. Since many objects and relationships induce
formal representations for Isabelle, it seemed possible to semantically mark
them up with a logic-inspired structure. But in the formalization process it
soon became apparent that (important) knowledge implicit in SAMSDocs did not
refer to the 'primary' structure aimed at with the use of sTeX. Instead, this
knowledge was concerned with a space of less formal, 'secondary'
classifications and relationships. Thus, our second observation pertains to
the substance of formalizations. Even though we wanted to find out what we
can sensibly formalize, we had assumed this to mean how much we can sensibly
formalize. Therefore, we were rather surprised to find distinct formality
structures realized in our sTeX extension. In the following we want to report
on these structures.

We grouped the macros and environments of sTeX-SD in fig. 5 according to
what induced them. Particularly, we distinguished the following triggers: 

— "objects" — document fragments viewed as autonomous elements — and

— their net of relationships via the collection, 

— documents and 

— their organizational handling, and 

— the project itself and thus, its own scheme of meaningful relationships. 

For instance, in the system specification we marked a recap of a definition
of the braking distance function for straight-ahead driving s_G as an object
and referenced it from within the assertion seen in fig. 4. In the module
specification s_G was then meticulously specified. This document fragment is
connected to the original one via a refinement-relationship from the V-Model,
which determined the creation process of the collection. Documents induce
layout structures like sections or subsections and they are themselves
organized for example under a version management scheme. In the workflow in
fig. 3 we already showcased a project-specific element, the definition table,
with its meaning. Interestingly, we cannot compare formality in one group
with the formality in another. For example, we cannot decide whether a
document completely marked up with the object-induced structures is more
formal than one fully semantically enhanced by the version management markup.
As these grouped structures only interact relatively lightly, we can consider
them as independent dimensions of a formality space that is reified in the
formalization process of a document collection.

Fig. 4. Braking Distance?

Concretely, sTeX-SD covers the following dimensions and consists of the listed
extension macros/environments (with attributes in [•] where sensible):



Object Structures

• SDobject[id]:
  Providing a document fragment with an identifier
• SDis[id, cat, for, follows, theory, imports, tab]:
  Categorizing an object and relating it to other objects
• SDmore[id, cat, for]:
  Overcoming linearity of documents through concatenation
• SDreferences[id, cd, refid]:
  Referencing an object - via content dictionary (theory) and identifier -
  and reifying the document fragment itself
• SDreferencesNoObj[cd, refid]:
  Referencing an object
• SDincludes[id, cd, from]:
  Including an object as an exact copy

Organizational Structures

• VMchangelist, VMchange:
  Version management
• VMcertification, VMcertified:
  Review history management
• SDnode[id, type], SDpath[id, from, to]:
  Data flow diagram

Collection Structures

• SemVMrel[id, rel, cd, refid]:
  Relationships within the V-Model

Project Structures

• SDTab-def, SDdef:
  Definition tables as described for fig. 3
• SDTab-paramUse, SDparamUse:
  Parameter-Use Tables containing specification for parameters in a
  project-specific data type
• SDTab-reqs, SDreq:
  Safety requirement tables with definitions of safety requirements and
  their dependencies
• SDTab-FMs, SDFM:
  Code error detection tables describing errors, their effects, their
  detection, and their induced program error state

Fig. 5. Formality Dimensions in sTeX-SD



Formalizing object structures is not always obvious, since many of the
documents contain recaps or previews of material that is introduced in other
documents/parts (e.g. to make them self-contained). Compare for example
fig. 4 and fig. 6, which are actually clippings from the system specification
"KonzeptBremsmodell.pdf". Note the use of s resp. s_G, both pointing in
fig. 4 to the braking distance function for straight-ahead driving (which is
obvious from the local context), whereas in fig. 6 s represents the general
arc length function of a circle, which is different in principle from the
braking distance, but coincides here.

Fig. 6. Yet another Braking Distance s?

We also realized that sTeX itself had already integrated another formality
dimension besides the logic-inspired one, the one concerned with document
layout: A typical document layout is structured into established parts like
sections or modules. If we want to keep this grouping information in the
formal XML document, we might use sTeX's DCM package for annotating general
document structures with Dublin Core (cf. [Dub08]) and similar
general-purpose metadata. In the sTeX box in fig. 3 we find for example the
command DCMsubsection with attributes containing the title of the subsection
and an identifier that can be used in the usual LaTeX referencing scheme.

Fig. 7. The Document Formality Dimension in sTeX

Finally, we would like to remark that the sTeX-SD preloading process was
executed as "in-place formalization" [SIM99]. It frequently considered
several of the above dimensions for the object at hand at the same time.
Therefore, the often applied metaphor of "formalization steps" does not
mirror the formalization process in our case study. We found that the
important aspect of the formalization was not its sequence per se, which we
consider particular to the SAMSDocs collection, but the fact that the
metaphor of 'steps' only worked within each single dimension of formality. In
particular, there is no single scale for formalization progress, as distinct
formality levels in distinct formality dimensions existed in a document at
one point in time.
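
The absence of a single progress scale can be illustrated with a small sketch.
It assumes, purely for illustration, that each dimension has a numeric
formality level; the dimension names and the levels are hypothetical and not
part of sTeX-SD. Two documents are then only comparable if one dominates the
other in every dimension.

```python
from typing import Dict, Optional

# Hypothetical formality profile: dimension name -> level reached (0 = no markup).
Profile = Dict[str, int]


def compare(a: Profile, b: Profile) -> Optional[int]:
    """Compare two profiles dimension by dimension.

    Returns -1, 0 or 1 if a is below, equal to, or above b in every
    dimension, and None if the two profiles are incomparable.
    """
    dims = set(a) | set(b)
    diffs = [a.get(d, 0) - b.get(d, 0) for d in dims]
    if all(x == 0 for x in diffs):
        return 0
    if all(x >= 0 for x in diffs):
        return 1
    if all(x <= 0 for x in diffs):
        return -1
    return None


# Illustrative documents (assumed levels, not measured data).
object_heavy = {"object": 3, "organizational": 0}
version_heavy = {"object": 0, "organizational": 3}
both = {"object": 3, "organizational": 3}

print(compare(object_heavy, version_heavy))  # None: neither is "more formal"
print(compare(both, object_heavy))           # 1: steps work within the product order
```

Within one dimension the comparison degenerates to an ordinary scale, which
matches the observation that 'steps' only make sense per dimension.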



3 Multi-Dimensional Information Needs 

We have shown that the formalization of knowledge results in an open-ended,
multi-dimensional space of primary and secondary classifications and
relationships. But are multi-dimensional document formalizations beneficial
for services supporting real users? Concretely, we envision potential
questions in the SAMS context and services that retrieve and display answers
based on the multi-dimensional markup of SAMSDocs.

Let us first take a programmer's perspective. Her main information source 
for the programming task will stem from the module specification. But while 
studying it the following questions might arise: 

(i) What is the definition for a certain (mathematical) symbol?³

(ii) How much of this specification has already been implemented? 

(iii) In what state is the proof of a specific equation, has it already been formally 
verified so that it is safe to ground my implementation on it? 

(iv) Whom can I ask for further details? 

3 See fig. 4 and 6 for two symbols having the same appearance but different meanings.

Assuming multi-dimensional markup, an information retrieval system can supply
useful responses. For example, it can answer (i) if technical terms in
natural language are linked to the respective formal mathematical symbols
they represent. For replies to (ii) and (iii) we note that, if all collection
links are merged into a graph, their original placement and direction no
longer makes a difference. So if we have links from the Isabelle
formalization to the respective C code and links from this C code to a
specification fragment, as realized in the V-Model structure of SAMSDocs, we
can follow the graph from the specification through to the state of the
corresponding proof. Drawing on the V-Model links combined with the semantic
version management or the review logs, the system can deduce the answer for
(iv): The code in question connects to a specification document that has
authors and reviewers. This service can be as fine-grained as one is willing
to formalize the granularity of the version and review management. If we
admit further dimensions of markup into the picture, then the system might
even find persons with similar interests (e.g. expressed in terms of the FOAF
vocabulary), as has been investigated in detail for expert finder systems
[SWJL10].
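
The remark that placement and direction of links stop mattering once they are
merged into a graph can be sketched as follows. The node names and relation
names are made up for illustration and do not come from SAMSDocs; the sketch
only shows that a specification fragment reaches the state of its proof even
though every link points the other way.

```python
from collections import defaultdict, deque

# Hypothetical links as (source, relation, target); all names are illustrative.
links = [
    ("isabelle:brake_proof", "verifies", "code:brake.c"),
    ("code:brake.c", "implements", "spec:braking_distance"),
    ("isabelle:brake_proof", "hasState", "state:verified"),
]


def neighbours(links):
    """Merge all links into one graph and forget their direction."""
    graph = defaultdict(set)
    for source, _relation, target in links:
        graph[source].add(target)
        graph[target].add(source)
    return graph


def reachable(links, start):
    """Breadth-first search over the merged, undirected graph."""
    graph = neighbours(links)
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


found = reachable(links, "spec:braking_distance")
print("state:verified" in found)  # True
```

A real service would keep the relation names and restrict the traversal to
selected dimensions; dropping them here keeps the sketch short.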

Now, we take a more global perspective, the one of a project manager. She 
might be concerned with the following issues: 

(v) Software Engineering Process: How much code has been implemented to 
satisfy a particular requirement from the contract? Has the formal code 
structure passed a certain static analysis and verification? She does not 
want to inspect that manually by running Isabelle, thus, she needs high- 
level figures of, e.g., the number of mathematical statements without a 
formally verified proof. 

(vi) Certification: What parts of the specification, e. g. requirements, have changed 
since the last certification? What other parts does that affect, and thus, 
what subset of the whole specification has to be re-certified? 

(vii) Human Capital: Who is in charge of a document? How could an author be 
replaced if necessary, taking into account colleagues working on the same 
or on related documents - such as previous revisions of the same document, 
or its predecessor in terms of the V-Model, i. e. the document that is refined 
by the current one? 

Exploiting the multi-dimensionality of formalized knowledge, it becomes obvious 
how the issues can be tackled. 

Finally, we envision a certifier's information needs. For inspection, she might 
first be interested in getting an overview, such as a list of all relevant concepts 
in the contract document. Then, she likes to follow the links to the detailed 
specification and further on to the actual implementation. For more information, 
she likes to contact the project investigator instead of the particular author of 
a code snippet. The certifier also needs to understand what parts of the whole 
specification are subject to a requested re-certification. Her rejection of a certain 
part of a document also affects all elements in the collection that depend on 
it. Again, a system can easily support a certifier's efficiency by combining the 
formalized information of distinct formality dimensions. 

These use scenarios in a Software Engineering project clearly show that multi- 
dimensional markup is useful, since multi-dimensional queries serve natural in- 
formation needs. To answer such queries, we need to represent multi-dimensional 
information in MKM formats. 

4 Multi-Dimensional Markup



Structured representations are usually realized as files marked up in formats that 
reflect the primary formalization intent and markup preferences of the formalizer. 
In the evaluation of document formats it is thus important to realize that every 
representation language concentrates on only a subset of possible relationships, 
which it treats with specific language constructs. Note that therefore the for- 
mality space of a semantically enhanced document is very often reduced to this 
primary dimension. On the formal side, for example, a plethora of system-specific 
logics exist. Furthermore, formal systems increasingly contain custom modular- 
ization infrastructures, ranging from simple facilities for inputting external files 
to elaborate multi-logic theory graphs [MML07]. Collections of informal
documents, on the other side, are often structured by application-specific metadata
like the Math Subject Classification [Soc09] or the V-Model relations as in our
case study. 

No given format can natively capture all aspects of the domain via
special-purpose markup primitives. It has to relegate some of them to other
mechanisms like the sTeX-SD extension for the formalization of SAMSDocs, if
more dimensions of the formality space than the primary one are to be
covered. In representation formats that support fragment identifiers (e.g.
XML-based ones) these relationships can be expressed as stand-off markup in
RDF (Resource Description Framework [RDF04]), i. e., as
subject-predicate-object triples, where subject and object are URI references
to a fragment and the predicate is a reference to a relationship specified in
an external vocabulary or ontology.⁴ As we have XML-based formats for
informal documents (e. g. XHTML+MathML+SVG) and formal specifications
(OpenMath or Content MathML), we can in principle already encode
multi-dimensional structured representations, if we only supply corresponding
metadata vocabularies for their structural relationships. Indeed this is the
basic architecture of the "Semantic Web approach" to eScience, and much of
the work of MKM can be seen as attempts to come up with good metadata
vocabularies for the mathematical/scientific domain.

4 The difference between "vocabulary" and "ontology" is not sharply defined. Vocabu-
laries are often developed in a bottom-up community effort and tend to have a low
degree of formality, whereas ontologies are often designed by a central group of ex-
perts and have a higher degree of formality. Here, we use "vocabulary" in its general
sense of a set of terms from a particular domain of interest. This subsumes the term
"ontology", which we will reserve for cases that require a more formal domain model.

Since RDF stand-off markup is notoriously difficult to keep up to date, RDFa
[ABMP08] has been developed: a set of attributes for embedding RDF
annotations into XML-based languages, originally XHTML. On the one hand, RDFa
serves as an enabling technology for making XML-based languages extensible by
inter- and intra-document relationships. On the other hand, RDFa serves as a
vehicle for document format interoperability. All relationships from a format
F that cannot be natively represented in a format F' can be represented as
RDFa triples, where the predicate is from an appropriately designed metadata
vocabulary that describes the format F. For instance, an OMDoc <theory>
element can be represented as
<div typeof="http://omdoc.org/ontology#Theory"> in XHTML, using the OMDoc
ontology [Lan10]. Support of RDFa relationships makes all XML-based formats
theoretically equivalent, if they allow fine-grained text structuring with
elements like XHTML's div or span everywhere (so that arbitrary text
fragments can be turned into objects). In particular, they become formats for
multi-dimensional markup as respective other dimensions can always be added
via RDFa. We have detailed the necessary extensions for the OMDoc format in
[Koh10], so that analogous extensions for any of the XML-based formats used
in the MKM community should be rather simple to create.
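
How such typeof annotations turn document fragments into typed objects can be
sketched with a few lines of Python. This is not a conforming RDFa processor:
it only reads literal about/typeof attribute pairs and ignores prefixes, base
URIs and all other RDFa attributes. The XHTML fragment, its identifiers and
the Symbol class URI are assumptions made for the example; only the Theory
class URI is taken from the text above.

```python
import xml.etree.ElementTree as ET

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Hypothetical XHTML fragment; the ids and the Symbol class are illustrative.
xhtml = """
<div xmlns="http://www.w3.org/1999/xhtml">
  <div about="#states" typeof="http://omdoc.org/ontology#Theory">
    <span about="#iB" typeof="http://omdoc.org/ontology#Symbol">i. B.</span>
  </div>
</div>
"""


def type_triples(source):
    """Collect (subject, rdf:type, class) triples from about/typeof pairs."""
    triples = []
    for element in ET.fromstring(source).iter():
        subject = element.get("about")
        rdf_class = element.get("typeof")
        if subject and rdf_class:
            triples.append((subject, RDF_TYPE, rdf_class))
    return triples


for triple in type_triples(xhtml):
    print(triple)
```

The same loop works for any XML-based host format that admits these
attributes, which is the sense in which the formats become interchangeable
carriers for additional dimensions.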

Note that the pragmatic restriction to XML-based representation formats is
not a loss of generality. In the MKM sphere the three classes of non-XML
languages used are computational logics, TeX/LaTeX, and PostScript/PDF. We
see computational logics as compact front-end formats that are optimized for
manual input of formal structured representations; it is our experience that
these can be transformed into the XML-based OpenMath, MathML, or OMDoc
without loss of information (but with a severe loss of notational
conciseness). We consider TeX/LaTeX as analogous for informal structured
representations; they can be transformed to XHTML+MathML by the LaTeXML
system. The last category of formats are presentation/print-oriented output
and archival formats where the situation is more problematic: PostScript (PS)
is largely superseded by PDF, which allows standard document-level RDF
annotations via XMP and the finer-granular annotations we need for structured
representations via extensions as in [GMH+07] or [Eri07]. But PS/PDF are
usually generated from other formats (mostly office formats or LaTeX), so
that alternative generation into XML-based formats like XHTML or OMDoc can be
used.

Note as well that a dimension typically corresponds to a vocabulary. In the
course of the SAMSDocs case study, most vocabularies have initially been
implemented from scratch in a project-specific ad hoc way. But they can be
elaborated towards ontologies via sTeX and these can be translated to
RDF-based formats that automated reasoners understand [KKL10]. An alternative
is reusing existing ontologies. This has the advantage that they are more
widely used and thus, reusable services may already have been implemented for
them. For instance, there already exists a vocabulary that defines basic
properties of persons and organizations: FOAF (Friend of a Friend [BM07]).
The widely known Dublin Core element set is also available as an ontology
[Dub08]. DCMI Terms [DCM08], a modernized and extended version of the Dublin
Core element set, offers a basic vocabulary for revision histories - but not
for reviewing and certification. DOAP (Description of a Project [Dub10])
describes software projects, albeit focusing on the top-level structure of
public open source projects. Lin et al. have developed an ontology for the
requirements-related parts of the V-Model (cf. [LFB96]). Happel and Seedorf
briefly review further ontologies about Software Engineering [HS06]. Since,
e. g., the SAMSDocs vocabularies can be integrated with existing ontologies
by declaring appropriate subclass or equivalence relationships, services can
make use of the best of both worlds.

5 Multi-Dimensional Services with MKM Technology



We will now study an avenue towards a concrete implementation of services
based on the use cases described in sect. 3, to show that MKM technologies
can feasibly cope with multi-dimensional information needs. Concretely, we
will study the task of project manager Nora to find a substitute for employee
Alice. All required information is contained in the sTeX-SD documents. To
abstract from the particulars of the sTeX/OMDoc RDFa encoding (e. g. the
somewhat arbitrarily chosen direction of the relations or the interaction of
metadata relations with the document and the special markup for the
mathematical dimension) we extract a uniform RDF representation of the
embedded structures, which can then be queried in the SPARQL language [PS08].
Listing 1.1 shows the necessary query in all detail.

Listing 1.1. Finding a Substitute for an Employee via the V-Model

# declaration of vocabulary (= dimension) namespace URIs
PREFIX vm:    <http://www.sams-projekt.de/ontologies/VersionManagement#>
PREFIX omdoc: <http://omdoc.org/ontology#>                # OMDoc
PREFIX semVM: <http://www.sams-projekt.de/ontologies/V-model#>
PREFIX dc:    <http://purl.org/dc/elements/1.1/>          # Dublin Core
PREFIX xsd:   <http://www.w3.org/2001/XMLSchema#>         # XML Schema datatypes
PREFIX foaf:  <http://xmlns.com/foaf/0.1/>                # FOAF, used for names below

SELECT ?potentialSubstituteName WHERE {
  # for each document Alice is responsible for, get all of its parts,
  # i.e. any kind of semantic (sub)object in the document
  ?document vm:responsible <.../employees#Alice> ;
            omdoc:hasPart ?object .

  # find other objects that are related to each ?object
  # 1. in that ?object refines them via the V-model
  { ?object semVM:refines ?relatedObject }
  UNION
  # 2. or in that they are other mathematical symbols defined in terms
  #    of ?object (only applies if ?object itself is a symbol)
  { ?object omdoc:occursInDefinitionOf ?relatedObject }

  # find the document that contains the related object and the person
  # responsible for that document ...
  ?otherDocument omdoc:hasPart ?relatedObject ;
                 dc:date ?date ;
                 vm:responsible ?potentialSubstitute .

  # (only considering documents that are sufficiently up to date)
  FILTER (?date > "2009-01-01"^^xsd:date)

  # ... and the real name of that person
  ?potentialSubstitute foaf:name ?potentialSubstituteName .
}



In this query we assume that Alice's FOAF profile is a part of our
collection, having the URI .../employees#Alice. Nora retrieves all documents
in the collection for which Alice is known to be the responsible person. For
any object O in each of these documents (e. g. the detailed specification of
the braking distance function for straight-ahead driving s_G from fig. 4),
she selects those objects that are refined by O in terms of the V-Model (e.g.
the general braking distance s). Additionally, she considers the mathematical
dimension and selects all objects that are related to O by mathematical
definition, e. g. the braking function that uses s_G. Of any such related
object, Nora finds out to what document it belongs. She is only interested in
recent documents and therefore filters them by date. Finally, she determines
the responsible persons via the version management links, and gets their
names from their FOAF descriptions. The assumption behind this query is that,
if, for example, Pierre is responsible for the specification that introduces
the general braking distance s, which Alice has refined, Pierre can be
considered as a substitute for Alice. Note that getting the answer draws on
the collection structures of SAMSDocs (V-Model), on the mathematical
structures, as well as on the organizational structures (version management).
It is easy to imagine how additional formality dimensions can be employed for
increasing precision or recall of the query, or for ranking results.
Consider, for example, another filter that only accepts as substitutes
employees who have never got a document rejected in any previous
certification.
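
The steps Nora follows can also be traced on a handful of in-memory triples.
The sketch below is a plain-Python walk over the same pattern as the query,
not a SPARQL engine, and every identifier and date in the data set is made up
for illustration.

```python
from datetime import date

# Hypothetical extracted triples; every identifier below is illustrative.
triples = {
    ("doc:moduleSpec", "vm:responsible", "emp:Alice"),
    ("doc:moduleSpec", "omdoc:hasPart", "obj:sG_detailed"),
    ("obj:sG_detailed", "semVM:refines", "obj:s_general"),
    ("doc:systemSpec", "omdoc:hasPart", "obj:s_general"),
    ("doc:systemSpec", "vm:responsible", "emp:Pierre"),
    ("doc:systemSpec", "dc:date", date(2009, 6, 1)),
    ("emp:Pierre", "foaf:name", "Pierre"),
}


def objects(subject, predicate):
    """All o with (subject, predicate, o) in the triple set."""
    return {o for s, p, o in triples if s == subject and p == predicate}


def subjects(predicate, obj):
    """All s with (s, predicate, obj) in the triple set."""
    return {s for s, p, o in triples if p == predicate and o == obj}


def substitutes(employee, cutoff):
    """Names of people responsible for recent documents related to the employee's work."""
    names = set()
    for document in subjects("vm:responsible", employee):
        for part in objects(document, "omdoc:hasPart"):
            # V-Model dimension united with the mathematical dimension
            related = (objects(part, "semVM:refines")
                       | objects(part, "omdoc:occursInDefinitionOf"))
            for other in related:
                for other_doc in subjects("omdoc:hasPart", other):
                    # keep only documents that are sufficiently up to date
                    if not any(d > cutoff for d in objects(other_doc, "dc:date")):
                        continue
                    for person in objects(other_doc, "vm:responsible"):
                        names |= objects(person, "foaf:name")
    return names


print(substitutes("emp:Alice", date(2009, 1, 1)))  # {'Pierre'}
```

Adding a further dimension, such as the certification history, would amount
to one more condition inside the innermost loop.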

The complexity of the query in listing 1.1 is directly caused by the
complexity of the underlying multi-dimensional structures and the
non-triviality of answering high-level project management queries from the
detailed information in SAMSDocs. As users like Nora would not want to deal
with a machine-oriented query language, we have developed a system that
integrates versioned storage of semantic document collections with
human-oriented presentation with embedded interactive services [DKL+10].
Thus, the rendered documents serve as command centers for executing queries
and displaying results.⁵ They provide access to queries in two ways: Queries
with a fixed structure that have to be answered repeatedly will be made
available right in the (rendered) documents in appropriate places. This is
the case with our employee substitution query: This month, Alice may be ill,
whereas next month, Bob may be on holiday. Access to this query can be given
wherever an employee or a reference to an employee occurs in a document.
Alternatively, non-prefabricated queries can be composed more intuitively on
demand using a visual input form.

These examples show that multi-dimensional queries like the ones naturally 
coming up in Software Engineering scenarios (sect. 3) can be answered with
existing MKM technology. Moreover, they illustrate that multi-dimensional markup
affords multi-dimensional services. If we interpret our dimensions as distinct con- 
texts, our services become context-sensitive, as dimensions can be filtered in and 
out. For instance, the context menu of certification documents could be equipped 
with menu entries for committing an approval or rejection to the server, which 
would only be displayed to the certifier. The server could then trigger further 
actions, such as marking the document that contains a rejected object and all 
dependencies of that object as rejected, too. In general, the more dimensions are 
formalized in a document, the more context-sensitive services become available. 
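
The server-side action mentioned above can be sketched as a simple graph traversal. The following Python fragment is only an illustration under stated assumptions: the dictionaries `dependencies` and `contained_in`, all identifiers in the sample data, and the choice to follow dependencies transitively are hypothetical and not taken from the actual system.

```python
from collections import deque
from typing import Dict, List, Set


def items_to_reject(
    rejected: str,
    dependencies: Dict[str, List[str]],
    contained_in: Dict[str, str],
) -> Set[str]:
    """Collect the rejected object, its (transitive) dependencies,
    and the document that contains the rejected object."""
    marked = {rejected}
    queue = deque([rejected])
    while queue:
        current = queue.popleft()
        for dep in dependencies.get(current, []):
            if dep not in marked:  # also guards against cyclic dependencies
                marked.add(dep)
                queue.append(dep)
    if rejected in contained_in:
        marked.add(contained_in[rejected])
    return marked


# Hypothetical sample data.
dependencies = {
    "safety-zone": ["braking-model"],
    "braking-model": ["sensor-spec"],
    "sensor-spec": [],
}
contained_in = {"safety-zone": "design-spec.tex"}
print(sorted(items_to_reject("safety-zone", dependencies, contained_in)))
```

Whether the propagation should be transitive, and in which direction the dependency relation is followed, would depend on the vocabulary used for the dependency dimension; the sketch fixes one possible reading.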



5 In particular, the rendered XHTML+MathML also preserves the original semantic 
structure as parallel MathML markup and RDFa annotations, so that a suitable 
browser plugin can dynamically generate interaction points for semantic services; 
see [KKL10] for details.






6 Conclusion and Further Work 

In this paper we have studied the applicability of MKM technologies in Soft- 
ware Engineering beyond "Formal Methods" (based on the concrete SAMSDocs 
document collection and its formalization). The initial hypothesis here is that 
contract documents, design specifications, user manuals, and integration reports 
can be partially formalized and integrated into a computer-supported software 
development process. To test this hypothesis we have studied a collection of 
documents created for the development of a safety zone computation, the formal 
verification that the braking trajectory always lies in the safety zone, and the 
SIL3 certification of this fact by a public certification agency. As the project 
documents contain a wealth of (informal) mathematical content, MKM formats 
(in this case our OMDoc format) are well-suited for this task. During the
formalization of the LaTeX part of the collection, we realized that the documents
contain an open-ended, multi-dimensional space of formality that can be used 
for supporting projects — if made explicit. 

We have shown that RDFa-based extensions of MKM formats, employing 
flexible "metadata" relationships referencing specific vocabularies, can be used 
to encode this formality space and put it into service. We have pointed out 
that the "dimensions" of this space can be seen to correspond to different meta- 
data vocabularies. Note that the distinction between data and metadata blurs 
here as, for example, the OMDoc data model realized by native markup in the 
OMDoc format can also be seen as OMDoc metadata and could equally be re- 
alized by RDFa annotations to some text markup format, where the meaning 
of the annotations is given by the OMDoc ontology. This "metadata view" is 
applicable to all MKM formats that mark up informal mathematical texts (e.g.
MathDox [CCB06] and MathLang [KWZ08]) as long as they formalize their
data model in an ontology. This observation makes decisions about which parts 
of the formality space to support with native markup a purely pragmatic choice 
and opens up new possibilities in the design of representation formats. It seems 
plausible that all MKM formats use native markup for mathematical knowledge 
structures (we think of them as primary formality structures for MKM) and 
differ mostly in the secondary ones they internalize. XHTML+MathML+RDFa 
might even serve as a baseline interchange format for MKM applications⁶ since
it is minimally committed. Note that if the metadata ontologies are represented 
in modular formats that admit theory morphisms, then these can be used as 
crosswalks between secondary metadata for higher levels of interoperability. We
leave the development of such crosswalks to future work.

The formalized secondary formality structures can be used for enriching 
interactive document browsing and for enabling multi-dimensional metadata 
queries over documents and collections. We have shown a set of exemplary multi- 
dimensional services based on the RDFa-encoded metadata, mostly centered 
around Linked Data approaches based on RDF-based queries. More services can
be obtained by exporting Linked Data to the Semantic Web or a company intranet
and thus enabling further reuse. In particular, the multi-dimensionality
observed in this paper and its realization via flexible metadata regimes in
representation formats allows the knowledge engineers to tailor the level of
formality to the intended applications.

6 Indeed, a similar proposal has been made for Semantic Wikis [VQ06], which have
related concerns but do not primarily involve mathematics.

In our case study, the metadata vocabularies ranged from project-specific 
ones that had to be developed (e.g. definition tables) to general ones like the 
V-Model vocabulary, for which external ontologies could be reused later on. We 
expect that such a range is generally the case for Software Engineering projects, 
and that the project-specific vocabularies may stabilize and be standardized 
in communities and companies, lowering the formalization effort entailed by 
each individual project. In fact we anticipate that such metadata vocabularies 
and the software development support services will become part of the strategic 
knowledge of technical organizations. 

In [CF09, 241] Carette and Farmer challenge MKM researchers by assessing
some of their technologies: "A lack of requirements analysis very often leads
to interesting solutions to problems which did not need solving". With this paper we 
hope to have shown that MKM technologies can be extended to cope with "real 
world concerns" (in Software Engineering). Indeed, industry is becoming more 
and more aware of and interested in Linked Data (see e.g. [Ser08] and [LDF,
Question 14]), which lends relevance to the multi-dimensional knowledge
management methods presented in this paper.